Supplementary Material for “RGB-Infrared Cross-Modality Person Re-Identification”
Abstract
This supplementary material accompanies the paper “RGB-Infrared Cross-Modality Person Re-Identification”. It gives more details on Section 4 and reports extra evaluations of our proposed deep zero-padding method.

1. Details of Counting Domain-Specific Nodes

In the third paragraph of Section 4.2 of the main manuscript, we quantify the number of domain-specific nodes in the network trained in our experiments. The categorization of node types defined in Equation (3) in Section 3 of the main manuscript is rather strict. In the $l$-th layer, let $\eta_i^{(l)}$ denote the $i$-th node and $f_{\mathrm{out}}(x^{(0)}, i, l)$ denote the output of $\eta_i^{(l)}$ given the network input $x^{(0)}$. Let $x^{(0)}_{d1}$ and $x^{(0)}_{d2}$ be inputs of the whole network from domain1 and domain2, respectively. The type of node $\eta_i^{(l)}$ is defined by

\[
\mathrm{type}\big(\eta_i^{(l)}\big) =
\begin{cases}
\text{domain1-specific}, & f_{\mathrm{out}}(x^{(0)}_{d2}, i, l) \equiv 0 \\
\text{domain2-specific}, & f_{\mathrm{out}}(x^{(0)}_{d1}, i, l) \equiv 0 \\
\text{shared}, & \text{otherwise.}
\end{cases}
\tag{1}
\]

Because Equation (1) requires the output to be identically zero, the condition is too strict in practice. We therefore relax it when counting domain-specific nodes by introducing a threshold $T$. The relaxed definition of node type is as follows: for all $x^{(0)}_{d1}$ and $x^{(0)}_{d2}$ in our experiments,

\[
\mathrm{type}\big(\eta_i^{(l)}\big) =
\begin{cases}
\text{domain1-specific}, & f_{\mathrm{out}}(x^{(0)}_{d2}, i, l) < T \ \text{and}\ f_{\mathrm{out}}(x^{(0)}_{d1}, i, l) > T \\
\text{domain2-specific}, & f_{\mathrm{out}}(x^{(0)}_{d1}, i, l) < T \ \text{and}\ f_{\mathrm{out}}(x^{(0)}_{d2}, i, l) > T \\
\text{shared}, & \text{otherwise.}
\end{cases}
\tag{2}
\]

Because the scale of the responses on feature maps differs from layer to layer, we set $T = \alpha \, \mathrm{std}(x_i^{(l)})$, where $\alpha$ is a proportion coefficient, $x_i^{(l)}$ is the output value of the $i$-th node in the $l$-th layer, and $\mathrm{std}(\cdot)$ is the standard deviation function. When a node corresponds to a channel of a feature map, we take the average of all values in that feature map as the node's output. We set $\alpha = 0.01$ and $\alpha = 0.05$ for the strict and loose categorizations, respectively. The relation between the proportion of domain-specific nodes and layer depth is shown in Figure S1.
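The relaxed rule in Equation (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names (`channel_outputs`, `classify_nodes`, `proportions`) are hypothetical, and two details the text leaves open are filled in by assumption, namely that the standard deviation is taken per node over the outputs for all inputs of both domains, and that "for all inputs" is checked with a per-node maximum and minimum over the samples.

```python
import numpy as np

def channel_outputs(feature_maps):
    """Reduce feature maps of shape (N, C, H, W) to node outputs of shape (N, C)
    by averaging all values in each channel's feature map."""
    return feature_maps.mean(axis=(2, 3))

def classify_nodes(out_d1, out_d2, alpha=0.01):
    """Apply the relaxed rule of Equation (2) to one layer.

    out_d1: (N1, C) node outputs for domain1 inputs.
    out_d2: (N2, C) node outputs for domain2 inputs.
    Returns two boolean masks of shape (C,): domain1-specific, domain2-specific.
    """
    # Assumption: std is computed per node over the outputs of all inputs.
    all_out = np.concatenate([out_d1, out_d2], axis=0)
    T = alpha * all_out.std(axis=0)
    # "For all inputs": below T for every sample of one domain,
    # above T for every sample of the other.
    d1_specific = (out_d2.max(axis=0) < T) & (out_d1.min(axis=0) > T)
    d2_specific = (out_d1.max(axis=0) < T) & (out_d2.min(axis=0) > T)
    return d1_specific, d2_specific

def proportions(d1_specific, d2_specific):
    """Per-domain and total proportions of domain-specific nodes in a layer."""
    p1 = float(d1_specific.mean())
    p2 = float(d2_specific.mean())
    return p1, p2, p1 + p2

# Toy layer with three nodes: node 0 fires only for domain1,
# node 1 only for domain2, node 2 for both.
out_d1 = np.array([[1.0, 0.0, 1.0], [2.0, 0.0, 3.0]])
out_d2 = np.array([[0.0, 1.0, 1.0], [0.0, 2.0, 2.0]])
d1_mask, d2_mask = classify_nodes(out_d1, out_d2, alpha=0.01)
p1, p2, total = proportions(d1_mask, d2_mask)
```

In this toy layer, node 0 is counted as domain1-specific, node 1 as domain2-specific, and node 2 as shared. Calling `classify_nodes` once per layer with `alpha=0.01` or `alpha=0.05` would give per-layer proportions of the kind plotted against depth.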
Figure S1 shows both the total proportion and the separate proportions for the two domains. With the strict threshold, domain-specific nodes mainly exist in the first three layers. With the loose threshold, they mainly exist in the first five layers. In both cases, the network learns more domain-specific nodes when deep zero-padding is used. When the threshold is loosened, the proportion of domain-specific nodes increases with deep zero-padding but remains nearly unchanged for inputs without zero-padding.

2. Evaluation on Using Different Networks

Our deep model is based on ResNet [1], as illustrated in Section 5 of the main manuscript. Deep zero-padding has been shown to be effective on ResNet-6 in our experiments. To verify whether it also works with other one-stream networks, we evaluated our method on the popular architectures AlexNet [2] and VGG-16 [3]. The results are reported in Table S1. Deep zero-padding improves performance in most cases for all evaluated network architectures. The improvement is especially evident for ResNet-6.
Similar resources
Person Depth ReID: Robust Person Re-identification with Commodity Depth Sensors
This work targets person re-identification (ReID) from depth sensors such as Kinect. Since depth is invariant to illumination and less sensitive than color to day-by-day appearance changes, a natural question is whether depth is an effective modality for Person ReID, especially in scenarios where individuals wear different colored clothes or over a period of several months. We explore the use o...
Volume-based Human Re-identification with RGB-D Cameras
This paper presents an RGB-D based human re-identification approach using novel biometrics features from the body’s volume. Existing work based on RGB images or skeleton features has some limitations for real-world robotic applications, most notably in dealing with occlusions and orientation of the user. Here, we propose novel features that allow performing re-identification when the person is ...
Algorithms for People Re-identification from Rgb-d Videos Exploiting Skeletal Information
In this thesis, a novel methodology to face the people re-identification problem is proposed. Re-identification is a complex research topic in Computer Vision representing a fundamental issue, especially for intelligent video surveillance applications. Its goal is to determine the occurrences of the same person in different video sequences or images, usually by choosing from a high number of ca...
One-Shot Person Re-identification with a Consumer Depth Camera
In this chapter, we propose a comparison between two techniques for one-shot person re-identification from soft biometric cues. One is based upon a descriptor composed of features provided by a skeleton estimation algorithm; the other compares body shapes in terms of whole point clouds. This second approach relies on a novel technique we propose to warp the subject’s point cloud to a standard po...
Learning Efficient Image Representation for Person Re-Identification
Color names based image representation is successfully used in person re-identification, due to the advantages of being compact, intuitively understandable as well as being robust to photometric variance. However, there exists the diversity between underlying distribution of color names’ RGB values and that of image pixels’ RGB values, which may lead to inaccuracy when directly comparing them i...